Orphanhood in Colombia

Mortality and Fertility rates

The mortality and fertility rates are computed from population estimates for 1998 to 2021. These estimates are based on the 2005 and 2018 censuses: the remaining years were obtained by linearly interpolating (and extrapolating) these two data sources at the desired spatial resolution.
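The following is a minimal sketch of the interpolation and extrapolation. It assumes that each stratum's estimate is a straight line through its 2005 and 2018 census counts. The helper names are hypothetical and are not the authors' code.

```python
# Hypothetical sketch: straight-line population estimates anchored on the
# 2005 and 2018 censuses, evaluated for every year from 1998 to 2021.
CENSUS_YEARS = (2005, 2018)

def linear_estimate(pop_2005: float, pop_2018: float, year: int) -> float:
    """Return the straight-line population estimate for `year`.

    Years in [2005, 2018] are interpolated; years outside that range are
    extrapolated along the same line, which is why they can become negative.
    """
    y0, y1 = CENSUS_YEARS
    slope = (pop_2018 - pop_2005) / (y1 - y0)
    return pop_2005 + slope * (year - y0)

def yearly_estimates(pop_2005: float, pop_2018: float,
                     first: int = 1998, last: int = 2021) -> dict[int, float]:
    """Estimates for every year from `first` to `last`, inclusive."""
    return {y: linear_estimate(pop_2005, pop_2018, y) for y in range(first, last + 1)}

est = yearly_estimates(130, 0)
print(est[1998], est[2021])  # 200.0 -30.0
```

Consider a stratum that shrinks from 130 people in 2005 to 0 in 2018. The line gives 200 for 1998 and -30 for 2021. This shows how the extrapolated years can produce the negative estimates handled in Problem #1.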

However, the population estimates may be inaccurate, especially at fine spatial resolutions (e.g., the municipality level), and so may the mortality and fertility rates derived from them. This happens more often for the years before 2005, which are extrapolated furthest from the census years.

We observed the following issues and handled each one as detailed below.

REMARK: The total number of rows is at most \(511,632 = 24 \times 1,122 \times 19\), i.e., \(24\) years (1998 to 2021), \(1,122\) municipalities, and \(19\) age groups (\(9\) for women and \(10\) for men). For the fertility data, there are \(8\) age groups for women and \(10\) for men, so the total is at most \(484,704 = 24 \times 1,122 \times 18\) rows.
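The row counts in the remark follow from simple arithmetic, and so do the percentages reported for each problem below. The short check below reproduces them.

```python
# Reproduce the maximum row counts stated in the remark.
YEARS = 2021 - 1998 + 1          # 24 years
MUNICIPALITIES = 1_122

mortality_groups = 9 + 10        # women + men (ages 10+)
fertility_groups = 8 + 10        # women + men

max_rows_mortality = YEARS * MUNICIPALITIES * mortality_groups
max_rows_fertility = YEARS * MUNICIPALITIES * fertility_groups

def share(modified: int, total: int) -> float:
    """Percentage of rows modified, rounded to two decimals."""
    return round(100 * modified / total, 2)

print(max_rows_mortality, max_rows_fertility)  # 511632 484704
print(share(1_547, max_rows_mortality))        # 0.3
```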


Problem #1

Description: Negative population estimates (produced by the linear interpolation/extrapolation).

Solution: Replace each negative population estimate with a later year's estimate; if the estimates are negative for all years, set the population size to 0.
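Below is a minimal sketch of this rule for one stratum's yearly series. It rests on two assumptions:

- "A later year's estimate" means the nearest later year with a non-negative value.
- A negative value with no non-negative later year is set to 0. This covers the all-negative case.

The function name is hypothetical.

```python
def fix_negative(series: list[float]) -> list[float]:
    """Replace negative estimates with the nearest later non-negative one.

    `series` is ordered by year. Assumption: a negative value with no later
    non-negative value (including an all-negative series) is set to 0.
    """
    fixed = list(series)
    next_valid = None                         # nearest later non-negative value
    for i in range(len(fixed) - 1, -1, -1):   # walk from the last year backwards
        if fixed[i] >= 0:
            next_valid = fixed[i]
        else:
            fixed[i] = next_valid if next_valid is not None else 0.0
    return fixed

print(fix_negative([-20.0, -5.0, 10.0, 25.0]))  # [10.0, 10.0, 10.0, 25.0]
print(fix_negative([-3.0, -1.0]))               # [0.0, 0.0]
```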

Population table: we modified \(1,547\) rows (out of \(511,632\), i.e., \(\approx 0.30\%\)).


Problem #2

Description: Impossible population estimates with respect to the number of deaths (e.g., the number of deaths exceeds the estimated population for some strata).

Solution: Treat the impossible values as missing data (NA) and impute them. In particular, we use the mean (or median) of the estimates for year - 1 and year + 1.
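The sketch below flags impossible values as NaN and fills each one with the mean of the adjacent years' values. The function names are hypothetical. With at most two neighboring years, the mean and the median coincide, so the choice between them does not matter here.

A value whose neighbors are both missing is left as NaN; this is an assumption. The Problem #3 check replaces the condition with "births > 0 and population == 0".

```python
import math
from statistics import mean

def flag_impossible_deaths(population: list[float], deaths: list[float]) -> list[float]:
    """Problem #2 check: mark the population as missing (NaN) where deaths exceed it."""
    return [math.nan if d > p else p for p, d in zip(population, deaths)]

def flag_impossible_births(population: list[float], births: list[float]) -> list[float]:
    """Problem #3 check: mark the population as missing where births > 0 but population == 0."""
    return [math.nan if b > 0 and p == 0 else p for p, b in zip(population, births)]

def impute_from_adjacent_years(series: list[float]) -> list[float]:
    """Fill each NaN with the mean of the non-missing year-1 and year+1 values.

    A value whose neighbors are both missing stays NaN.
    """
    out = list(series)
    for i, value in enumerate(series):
        if math.isnan(value):
            neighbors = [series[j] for j in (i - 1, i + 1)
                         if 0 <= j < len(series) and not math.isnan(series[j])]
            if neighbors:
                out[i] = mean(neighbors)
    return out

pop = flag_impossible_deaths([100.0, 5.0, 110.0], [2.0, 8.0, 3.0])
print(impute_from_adjacent_years(pop))  # [100.0, 105.0, 110.0]
```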

Population table: we modified \(27\) rows (out of \(511,632\), i.e., \(\approx 0.01\%\)).


Problem #3

Description: Impossible population estimates with respect to the number of births (e.g., the number of births is non-zero while the population is zero).

Solution: Same as in Problem #2.

Population table: we modified \(138\) rows (out of \(484,704\), i.e., \(\approx 0.03\%\)).


Problem #4

Description: Unlikely population estimates with respect to the number of deaths (i.e., estimates that yield outlying mortality rates).

Solution: Set the population size so that the mortality rate equals the (lower or upper) limit beyond which it would be considered an outlier. The outlier thresholds are given by the variation of the corresponding time series, namely \(\text{mean} \pm 3\times\text{sd}\), computed over a pre-defined time period. In particular, we considered the interpolated (not extrapolated) interval, i.e., 2005-2018.
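Below is a sketch of the capping rule for a single stratum. It rests on three assumptions:

- Rates are plain ratios (deaths / population, not per 1,000).
- "sd" is the sample standard deviation of the stratum's 2005-2018 rates.
- The lower limit is enforced only when there is at least one death; otherwise the capped population would be 0.

Replacing deaths with births gives the Problem #5 variant. Names are hypothetical.

```python
from statistics import mean, stdev

def outlier_limits(reference_rates: list[float]) -> tuple[float, float]:
    """Return (lower, upper) = mean -/+ 3 * sample sd of the reference rates."""
    m, s = mean(reference_rates), stdev(reference_rates)
    return m - 3 * s, m + 3 * s

def cap_population(population: float, deaths: float,
                   lower: float, upper: float) -> float:
    """Adjust the population so that deaths / population lies within [lower, upper]."""
    if population <= 0:
        return population                  # rate undefined; see Problems #1-#3
    rate = deaths / population
    if rate > upper and upper > 0:
        return deaths / upper              # larger population: rate equals upper limit
    if rate < lower and lower > 0 and deaths > 0:
        return deaths / lower              # smaller population: rate equals lower limit
    return population

lower, upper = outlier_limits([1.0, 2.0, 3.0])
print(lower, upper)                        # -1.0 5.0
print(cap_population(1000.0, 1.0, 0.002, 0.02))
```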

Population table: we modified \(17,874\) rows (out of \(511,632\), i.e., \(\approx 3.49\%\)).


Problem #5

Description: Unlikely population estimates with respect to the number of births (e.g., the number of births is five or more times the population size for some strata).

Solution: Same as in Problem #4.

Population table: we modified \(19,942\) rows (out of \(484,704\), i.e., \(\approx 4.11\%\)).


Problem #6

Description: Even after processing the data as in Problems #1-5, some outliers may remain for specific municipalities and strata.

Solution: We once again identify these values and replace them with NA. As before, \(x_i\) is an outlier if it does not fall within \(\text{mean}(\mathbf{x}) \pm 3\times\text{sd}(\mathbf{x})\). Imputation is again based on the \(n^{\text{th}}\)-order neighbors (here \(n = 1\)); i.e., each removed value is replaced by the mean (or median) of its neighbors' rates.
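Below is a sketch of this second pass on one rate series. It assumes that the first-order neighbors are the adjacent years, as in the year - 1 / year + 1 imputation of Problem #2. It uses the sample standard deviation. The function name is hypothetical. Neighbors that are themselves outliers are not used for imputation.

```python
import math
from statistics import mean, stdev

def remove_and_impute_outliers(rates: list[float], k: float = 3.0) -> list[float]:
    """Flag rates outside mean +/- k*sd as missing, then impute them.

    Assumption: the first-order (n = 1) neighbors are the adjacent years.
    """
    m, s = mean(rates), stdev(rates)
    flagged = [math.nan if abs(r - m) > k * s else r for r in rates]
    out = list(flagged)
    for i, r in enumerate(flagged):
        if math.isnan(r):
            neighbors = [flagged[j] for j in (i - 1, i + 1)
                         if 0 <= j < len(flagged) and not math.isnan(flagged[j])]
            if neighbors:
                out[i] = mean(neighbors)
    return out

series = [1.0] * 10 + [10.0] + [1.0] * 10   # one spike in a flat series
print(remove_and_impute_outliers(series)[10])  # 1.0
```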

Population table:

  • Mortality: we modified \(2,520\) rows (out of \(511,632\), i.e., \(\approx 0.49\%\)).


  • Fertility: we modified \(8,922\) rows (out of \(484,704\), i.e., \(\approx 1.84\%\)).

Problem #7

Description: Because the corrections were made independently for the mortality and fertility rates, the resulting population estimates in the two data sets may differ for some combinations of municipality, gender, and age group.

Solution: To correct this, we set the population and X_rate columns to NA in all rows where either mortality_rates or fertility_rates has a missing value. For imputation, after the corrections from the previous steps, we average the population estimates of the two data sets and recompute the rates accordingly.
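The averaging step can be sketched as follows. The sketch assumes both tables are keyed by (municipality, year, sex, age group). It also assumes that a key present in only one table keeps that table's population; this happens, for example, for the female age group that exists in the mortality data but not in the fertility data.

The NA-propagation step that precedes the averaging is not shown. All names and the row layout are hypothetical.

```python
from typing import NamedTuple

Key = tuple[str, int, str, str]   # (municipality, year, sex, age group): hypothetical layout

class Row(NamedTuple):
    population: float
    events: float                 # deaths (mortality) or births (fertility)

def reconcile(mortality: dict[Key, Row], fertility: dict[Key, Row]):
    """Average the two population estimates on shared keys and recompute rates.

    Keys present in only one table keep that table's population (assumption).
    Returns two dicts mapping key -> (population, rate).
    """
    def rebuild(table: dict[Key, Row], other: dict[Key, Row]) -> dict[Key, tuple[float, float]]:
        out = {}
        for key, row in table.items():
            pop = row.population
            if key in other:
                pop = (pop + other[key].population) / 2   # shared key: average both estimates
            rate = row.events / pop if pop > 0 else float("nan")
            out[key] = (pop, rate)
        return out

    return rebuild(mortality, fertility), rebuild(fertility, mortality)

k = ("A", 2010, "F", "25-29")
new_mort, new_fert = reconcile({k: Row(100.0, 2.0)}, {k: Row(300.0, 20.0)})
print(new_mort[k], new_fert[k])   # (200.0, 0.01) (200.0, 0.1)
```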

Population table:

  • Mortality: we modified \(36,121\) rows (out of \(511,632\), i.e., \(\approx 7.06\%\)).


  • Fertility: we modified \(36,121\) rows (out of \(484,704\), i.e., \(\approx 7.45\%\)).


These are the final population estimates. Note that the correction process above was applied only to the age groups 10+. The population estimates for individuals aged 0-9 were kept unchanged, as they are not used when estimating the mortality and fertility rates.


Aggregating the data over all municipalities gives the following yearly total population estimates.
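Aggregating over municipalities amounts to a sum grouped by year. Below is a minimal sketch, assuming a flat (municipality, year, population) row layout; the layout and the function name are hypothetical.

```python
from collections import defaultdict

def yearly_totals(rows):
    """Sum the population over all municipalities for each year.

    `rows` is an iterable of (municipality, year, population) tuples,
    a hypothetical flat layout used only for illustration.
    """
    totals: dict[int, float] = defaultdict(float)
    for _municipality, year, population in rows:
        totals[year] += population
    return dict(sorted(totals.items()))   # ordered by year

print(yearly_totals([("A", 2005, 10.0), ("B", 2005, 5.0), ("A", 2006, 7.0)]))
# {2005: 15.0, 2006: 7.0}
```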

Results

We now compare the mortality and fertility rates before (Old) and after (New) processing the population data as described above.


The figures below show the time series of the mortality and fertility counts (and rates) for female and male individuals in the 25-29 age group in 40 randomly selected municipalities.

Mortality Female (Old vs. New)

Mortality Male (Old vs. New)

Fertility Female (Old vs. New)

Fertility Male (Old vs. New)

Next, we show the estimated mortality and fertility rates (mean and standard deviation) of female and male individuals in all age groups and municipalities.

Mortality Female (Old vs. New)

Mortality Male (Old vs. New)

Fertility Female (Old vs. New)

Fertility Male (Old vs. New)